184 research outputs found

    Paradigm of tunable clustering using binarization of consensus partition matrices (Bi-CoPaM) for gene discovery

    Copyright © 2013 Abu-Jamous et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant for a problem and not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Also, these algorithms cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed. In this paradigm, all three eventualities of a gene being exclusively assigned to a single cluster, being assigned to multiple clusters, and being not assigned to any cluster are possible. These possibilities are realised through the primary novelty of the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet requirements of different gene discovery studies. National Institute for Health Researc
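
The aggregate-then-binarize idea in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the clusters from each run are already aligned, and it uses a single fixed membership threshold as a stand-in for the paper's tunable binarization techniques, which the abstract does not specify. The function names and the toy data are hypothetical.

```python
# Sketch only: averaging plus a fixed membership threshold is an assumed,
# simplified stand-in for the paper's tunable binarization techniques.


def consensus_partition_matrix(partitions):
    """Average several binary partition matrices (clusters x genes).

    Assumes the clusters have already been aligned across runs.
    """
    n_runs = len(partitions)
    n_clusters = len(partitions[0])
    n_genes = len(partitions[0][0])
    return [
        [sum(p[k][g] for p in partitions) / n_runs for g in range(n_genes)]
        for k in range(n_clusters)
    ]


def binarize(copam, threshold):
    """Assign gene g to cluster k when its fuzzy membership >= threshold."""
    return [[1 if m >= threshold else 0 for m in row] for row in copam]


# Three hypothetical clustering runs over 4 genes and 2 clusters.
runs = [
    [[1, 1, 0, 0], [0, 0, 1, 1]],
    [[1, 0, 0, 0], [0, 1, 1, 1]],
    [[1, 1, 1, 0], [0, 0, 0, 1]],
]
copam = consensus_partition_matrix(runs)
tight = binarize(copam, 1.0)  # keep only genes that every run agrees on
wide = binarize(copam, 0.3)   # keep any gene with some support
```

With the strict threshold, the two middle genes are left unassigned; with the loose threshold, they belong to both clusters. Together these cover the three cases the abstract describes: exclusive assignment, multiple assignment, and no assignment.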

    Difference-based clustering of short time-course microarray data with replicates

    Background: There are some limitations associated with conventional clustering methods for short time-course gene expression data. The current algorithms require prior domain knowledge and do not incorporate information from replicates. Moreover, the results are not always easy to interpret biologically. Results: We propose a novel algorithm for identifying a subset of genes sharing a significant temporal expression pattern when replicates are used. Our algorithm requires no prior knowledge, instead relying on an observed statistic which is based on the first and second order differences between adjacent time-points. Here, a pattern is predefined as the sequence of symbols indicating direction and the rate of change between time-points, and each gene is assigned to a cluster whose members share a similar pattern. We compared the performance of our algorithm with that of the K-means, Self-Organizing Map and Short Time-series Expression Miner methods. Conclusions: Assessments using simulated and real data show that our method outperformed the aforementioned algorithms. Our approach is an appropriate solution for clustering short time-course microarray data with replicates.
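
The pattern encoding described above (symbols for direction and rate of change, derived from first and second order differences) might look roughly like the following. This is a hedged sketch only: the symbols, the tolerance `eps`, and the choice to average replicates before differencing are all assumptions made for illustration, and they are not the statistic the paper defines.

```python
# Sketch only: the symbols and the zero-tolerance `eps` are illustrative
# assumptions, not the statistic defined in the paper.
from collections import defaultdict


def sign_symbol(value, eps):
    """Map a difference to '+', '-' or '0' (within eps of zero)."""
    if value > eps:
        return "+"
    if value < -eps:
        return "-"
    return "0"


def pattern(profile, eps=0.1):
    """Encode direction (first differences) and rate (second differences)."""
    first = [b - a for a, b in zip(profile, profile[1:])]
    second = [b - a for a, b in zip(first, first[1:])]
    direction = "".join(sign_symbol(d, eps) for d in first)
    rate = "".join(sign_symbol(d, eps) for d in second)
    return direction + "|" + rate


def cluster_by_pattern(replicates_by_gene, eps=0.1):
    """Average replicates per time point, then group genes by pattern."""
    clusters = defaultdict(list)
    for gene, replicates in replicates_by_gene.items():
        mean_profile = [sum(vals) / len(vals) for vals in zip(*replicates)]
        clusters[pattern(mean_profile, eps)].append(gene)
    return dict(clusters)


# Hypothetical data: two replicates per gene, four time points each.
data = {
    "geneA": [[0.0, 1.0, 3.0, 3.0], [0.2, 1.0, 3.2, 3.2]],
    "geneB": [[1.0, 2.0, 4.0, 4.0], [1.0, 2.2, 4.4, 4.4]],
    "geneC": [[3.0, 2.0, 0.0, 0.0], [3.0, 2.0, 0.0, 0.0]],
}
clusters = cluster_by_pattern(data)
```

Here `geneA` and `geneB` both rise with increasing speed and then level off, so they share a pattern, while `geneC` mirrors that shape and lands in its own cluster. No number of clusters is chosen in advance, which matches the abstract's claim that no prior knowledge is required.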

    Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes

    BACKGROUND: A cluster analysis is the most commonly performed procedure (often regarded as a first step) on a set of gene expression profiles. In most cases, a post hoc analysis is done to see if the genes in the same clusters can be functionally correlated. While past successes of such analyses have often been reported in a number of microarray studies (most of which used the standard hierarchical clustering, UPGMA, with one minus the Pearson's correlation coefficient as a measure of dissimilarity), often times such groupings could be misleading. More importantly, a systematic evaluation of the entire set of clusters produced by such unsupervised procedures is necessary since they also contain genes that are seemingly unrelated or may have more than one common function. Here we quantify the performance of a given unsupervised clustering algorithm applied to a given microarray study in terms of its ability to produce biologically meaningful clusters using a reference set of functional classes. Such a reference set may come from prior biological knowledge specific to a microarray study or may be formed using the growing databases of gene ontologies (GO) for the annotated genes of the relevant species. RESULTS: In this paper, we introduce two performance measures for evaluating the results of a clustering algorithm in its ability to produce biologically meaningful clusters. The first measure is a biological homogeneity index (BHI). As the name suggests, it is a measure of how biologically homogeneous the clusters are. This can be used to quantify the performance of a given clustering algorithm such as UPGMA in grouping genes for a particular data set and also for comparing the performance of a number of competing clustering algorithms applied to the same data set. The second performance measure is called a biological stability index (BSI). 
For a given clustering algorithm and an expression data set, it measures the consistency of the clustering algorithm's ability to produce biologically meaningful clusters when applied repeatedly to similar data sets. A good clustering algorithm should have high BHI and moderate to high BSI. We evaluated the performance of ten well known clustering algorithms on two gene expression data sets and identified the optimal algorithm in each case. The first data set deals with SAGE profiles of differentially expressed tags between normal and ductal carcinoma in situ samples of breast cancer patients. The second data set contains the expression profiles over time of positively expressed genes (ORFs) during sporulation of budding yeast. Two separate choices of the functional classes were used for this data set and the results were compared for consistency. CONCLUSION: Functional information of annotated genes available from various GO databases mined using ontology tools can be used to systematically judge the results of an unsupervised clustering algorithm as applied to a gene expression data set in clustering genes. This information could be used to select the right algorithm from a class of clustering algorithms for the given data set.
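
To make the homogeneity idea concrete, one plausible pairwise form of a BHI-style score is sketched below. The abstract does not give the formula, so this is an assumption: each cluster is scored by the fraction of annotated gene pairs that share at least one functional class, and the scores are averaged over clusters. The gene names and classes are invented.

```python
# Sketch only: an assumed pairwise form of a biological homogeneity index;
# the gene names and functional classes are made up for illustration.
from itertools import combinations


def biological_homogeneity_index(clusters, functional_classes):
    """Mean, over clusters, of the fraction of annotated gene pairs that
    share at least one functional class."""
    scores = []
    for genes in clusters:
        annotated = [g for g in genes if functional_classes.get(g)]
        pairs = list(combinations(annotated, 2))
        if not pairs:
            continue  # clusters with fewer than two annotated genes are skipped
        shared = sum(
            1 for a, b in pairs
            if functional_classes[a] & functional_classes[b]
        )
        scores.append(shared / len(pairs))
    return sum(scores) / len(scores) if scores else 0.0


classes = {
    "g1": {"cell cycle"},
    "g2": {"cell cycle", "DNA repair"},
    "g3": {"DNA repair"},
    "g4": {"transport"},
    "g5": {"transport"},
    "g6": set(),  # unannotated gene, ignored by the index
}
clusters = [["g1", "g2", "g3"], ["g4", "g5", "g6"]]
bhi = biological_homogeneity_index(clusters, classes)
```

In this toy case the first cluster scores 2/3 (two of its three pairs share a class) and the second scores 1, so the index is their mean, 5/6. Computing the same score for the output of several algorithms on one data set gives a simple basis for the comparison the abstract describes.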

    Application of machine learning methods to histone methylation ChIP-Seq data reveals H4R3me2 globally represses gene expression

    Background: In the last decade, biochemical studies have revealed that epigenetic modifications including histone modifications, histone variants and DNA methylation form a complex network that regulates the state of chromatin and processes that depend on it, including transcription and DNA replication. Currently, a large number of these epigenetic modifications are being mapped in a variety of cell lines at different stages of development using high throughput sequencing by members of the ENCODE consortium, the NIH Roadmap Epigenomics Program and the Human Epigenome Project. An extremely promising and underexplored area of research is the application of machine learning methods, which are designed to construct predictive network models, to these large-scale epigenomic data sets. Results: Using a ChIP-Seq data set of 20 histone lysine and arginine methylations and histone variant H2A.Z in human CD4+ T-cells, we built predictive models of gene expression as a function of histone modification/variant levels using Multilinear (ML) Regression and Multivariate Adaptive Regression Splines (MARS). Along with extensive crosstalk among the 20 histone methylations, we found H4R3me2 was the most and second most globally repressive histone methylation among the 20 studied in the ML and MARS models, respectively. In support of our finding, a number of experimental studies show that PRMT5-catalyzed symmetric dimethylation of H4R3 is associated with repression of gene expression. This includes a recent study, which demonstrated that H4R3me2 is required for DNMT3A-mediated DNA methylation, a known global repressor of gene expression. Conclusion: In stark contrast to univariate analysis of the relationship between H4R3me2 and gene expression levels, our study showed that the regulatory role of some modifications like H4R3me2 is masked by confounding variables, but can be elucidated by multivariate/systems-level approaches.

    Brain perfusion imaging with voxel-based analysis in secondary progressive multiple sclerosis patients with a moderate to severe stage of disease: a boon for the workforce

    Background: The present study was carried out to evaluate cerebral perfusion in multiple sclerosis (MS) patients with a moderate to severe stage of disease. Some patients underwent hyperbaric oxygen therapy (HBOT), and brain perfusion before and after treatment was compared. Methods: We retrospectively reviewed 25 secondary progressive (SP)-MS patients from the hospital database. Neurological disability was evaluated by the Expanded Disability Status Scale (EDSS) score. Brain perfusion imaging was performed by 99mTc-labeled bicisate (ECD) brain SPECT and the data were compared using statistical parametric mapping (SPM). In total, 16 patients underwent HBOT. Before HBOT and at the end of 20 sessions of oxygen treatment, 99mTc-ECD brain perfusion single photon emission computed tomography (SPECT) was performed again, and the results were evaluated and compared. Results: A total of 25 SP-MS patients, 14 females (56%) and 11 males (44%) with a mean age of 38.92 ± 11.28 years, were included in the study. The mean disease duration was 8.70 ± 5.30 years. Of the 25 patients, 2 (8%) had a normal SPECT and 23 (92%) had abnormal brain perfusion SPECT studies. The study showed a significant association of the severity of perfusion impairment with disease duration and also with EDSS (P < 0.05). There was a significant improvement between pre- and post-treatment perfusion scans (P < 0.05), but this was not accompanied by a significant improvement in the clinical subjective and objective evaluation of patients (P > 0.05). Conclusions: This study depicted decreased cerebral perfusion in SP-MS patients with a moderate to severe disability score and its association with clinical parameters. Because of its accessibility, relatively low price, practical ease, and objective quantitative information, brain perfusion SPECT can complement other diagnostic modalities such as MRI and clinical examinations in disease surveillance and monitoring. The literature on this important issue is extremely scarce, and follow-up studies are required to assess these preliminary results.

    Measurement of the Forward-Backward Asymmetry in the B -> K(*) mu+ mu- Decay and First Observation of the Bs -> phi mu+ mu- Decay

    We reconstruct the rare decays $B^+ \to K^+\mu^+\mu^-$, $B^0 \to K^{*}(892)^0\mu^+\mu^-$, and $B^0_s \to \phi(1020)\mu^+\mu^-$ in a data sample corresponding to $4.4 {\rm fb^{-1}}$ collected in $p\bar{p}$ collisions at $\sqrt{s}=1.96 {\rm TeV}$ by the CDF II detector at the Fermilab Tevatron Collider. Using $121 \pm 16$ $B^+ \to K^+\mu^+\mu^-$ and $101 \pm 12$ $B^0 \to K^{*0}\mu^+\mu^-$ decays we report the branching ratios. In addition, we report the measurement of the differential branching ratio and the muon forward-backward asymmetry in the $B^+$ and $B^0$ decay modes, and the $K^{*0}$ longitudinal polarization in the $B^0$ decay mode with respect to the squared dimuon mass. These are consistent with the theoretical prediction from the standard model, and most recent determinations from other experiments and of comparable accuracy. We also report the first observation of the $B^0_s \to \phi\mu^+\mu^-$ decay and measure its branching ratio ${\mathcal{B}}(B^0_s \to \phi\mu^+\mu^-) = [1.44 \pm 0.33 \pm 0.46] \times 10^{-6}$ using $27 \pm 6$ signal events. This is currently the most rare $B^0_s$ decay observed. Comment: 7 pages, 2 figures, 3 tables. Submitted to Phys. Rev. Let

    Search for a New Heavy Gauge Boson Wprime with Electron + missing ET Event Signature in ppbar collisions at sqrt(s)=1.96 TeV

    We present a search for a new heavy charged vector boson $W^\prime$ decaying to an electron-neutrino pair in $p\bar{p}$ collisions at a center-of-mass energy of 1.96 TeV. The data were collected with the CDF II detector and correspond to an integrated luminosity of 5.3 fb$^{-1}$. No significant excess above the standard model expectation is observed and we set upper limits on $\sigma\cdot{\cal B}(W^\prime\to e\nu)$. Assuming standard model couplings to fermions and the neutrino from the $W^\prime$ boson decay to be light, we exclude a $W^\prime$ boson with mass less than 1.12 TeV/$c^2$ at the 95% confidence level. Comment: 7 pages, 2 figures. Submitted to PR

    Measurements of the properties of Lambda_c(2595), Lambda_c(2625), Sigma_c(2455), and Sigma_c(2520) baryons

    We report measurements of the resonance properties of Lambda_c(2595)+ and Lambda_c(2625)+ baryons in their decays to Lambda_c+ pi+ pi- as well as Sigma_c(2455)++,0 and Sigma_c(2520)++,0 baryons in their decays to Lambda_c+ pi+/- final states. These measurements are performed using data corresponding to 5.2/fb of integrated luminosity from ppbar collisions at sqrt(s) = 1.96 TeV, collected with the CDF II detector at the Fermilab Tevatron. Exploiting the largest available charmed baryon sample, we measure masses and decay widths with uncertainties comparable to the world averages for Sigma_c states, and significantly smaller uncertainties than the world averages for excited Lambda_c+ states. Comment: added one reference and one table, changed order of figures, 17 pages, 15 figure

    A transversal approach to predict gene product networks from ontology-based similarity

    Background: Interpretation of transcriptomic data is usually made through a "standard" approach which consists of clustering the genes according to their expression patterns and exploiting Gene Ontology (GO) annotations within each expression cluster. This approach makes it difficult to highlight functional relationships between gene products that belong to different expression clusters. To address this issue, we propose a transversal analysis that aims to predict functional networks based on a combination of GO processes and expression data. Results: The transversal approach presented in this paper consists of computing the semantic similarity between gene products in a Vector Space Model. Through a weighting scheme over the annotations, we take into account the representativity of the terms that annotate a gene product. Comparing annotation vectors results in a matrix of gene product similarities. Combined with expression data, the matrix is displayed as a set of functional gene networks. The transversal approach was applied to 186 genes related to the enterocyte differentiation stages. This approach resulted in 18 functional networks that proved to be biologically relevant. These results were compared with those obtained through a standard approach and with an approach based on information content similarity. Conclusion: Complementary to the standard approach, the transversal approach offers new insight into the cellular mechanisms and reveals new research hypotheses by combining gene product networks based on semantic similarity with expression data.
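
The Vector Space Model step can be illustrated with a small sketch: each gene product becomes a vector of weighted annotation terms, and pairs of vectors are compared by cosine similarity. The abstract does not state the weighting scheme, so the inverse-frequency weights used here are an assumption, and the gene products and annotations are invented.

```python
# Sketch only: the inverse-frequency weights below are an assumed stand-in
# for the paper's weighting scheme, and the annotations are invented.
import math


def term_weights(annotations):
    """Weight each term by log(N / number of gene products annotated with it)."""
    n = len(annotations)
    counts = {}
    for terms in annotations.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    return {term: math.log(n / c) for term, c in counts.items()}


def cosine_similarity(terms_a, terms_b, weights):
    """Cosine of the angle between two weighted annotation vectors."""
    dot = sum(weights[t] ** 2 for t in terms_a & terms_b)
    norm_a = math.sqrt(sum(weights[t] ** 2 for t in terms_a))
    norm_b = math.sqrt(sum(weights[t] ** 2 for t in terms_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


annotations = {
    "P1": {"lipid transport", "metabolic process"},
    "P2": {"lipid transport", "metabolic process"},
    "P3": {"cell adhesion", "metabolic process"},
}
weights = term_weights(annotations)
similarity = {
    (a, b): cosine_similarity(annotations[a], annotations[b], weights)
    for a in annotations
    for b in annotations
}
```

Under this weighting, a term that annotates every gene product ("metabolic process" here) gets zero weight, so `P1` and `P3` end up with zero similarity even though they share it, while `P1` and `P2` are fully similar. The resulting matrix plays the role of the gene product similarity matrix from which networks could then be drawn.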